Abstract

This paper provides a new framework for analyzing white noise disturbances in linear systems:
rather than the usual stochastic approach, noise signals are described as elements in sets
and their effect is analyzed from a worst-case perspective.

The  paper  studies  how  these  sets  must  be  chosen  in  order  to  have  adequate  properties  for 
system  response  in  the  worst-case,  statistics  consistent  with  the  stochastic  point  of  view,  and 
simple descriptions that allow for tractable worst-case analysis. The methodology is demonstrated
by considering its implications in two problems: rejection of white noise signals in the
presence  of  system  uncertainty,  and  worst-case  system  identification. 


1  Introduction 

A  general  feature  of  mathematical  models  in  engineering  science  is  the  presence  of  modeling  errors, 
which  arise  due  to  poorly  understood  or  highly  unpredictable  phenomena,  or  from  simplifications 
deliberately  introduced  for  the  sake  of  model  tractability.  Essentially  two  approaches  are  available 
to  assess  the  consequences  of  this  error:  one  is  to  model  the  uncertainty  in  terms  of  a  set  of  allowable 
perturbations  and  perform  worst-case  analysis  over  this  set;  the  other  is  to  assign  the  additional 
structure  of  a  probability  measure  to  the  error,  and  perform  analysis  in  the  average. 

Uncertainty  is  often  the  dominant  issue  in  models  used  for  control  system  design.  These  models 
involve substantial approximations (linearizations, unmodeled dynamics) and uncertain parameter
values, all of which lead to systematic modeling errors for which the only natural characterization is
based  on  sets.  Also,  the  issue  of  stability  provides  an  incentive  to  take  the  worst-case  point  of  view. 
This  has  been  the  strategy  of  robust  control  theory,  which  has  developed  mathematical  tools  for  the 
evaluation  of  stability  and  performance  in  the  worst  case  over  sets  of  systems.  In  this  theory,  the 
methodology  based  on  sets  is  also  applied  to  disturbance  signals  (another  source  of  uncertainty), 
by modeling them in terms of a ball in some signal space (e.g. $\ell_2$, $\ell_\infty$), which motivates the $\mathcal{H}_\infty$ or
$\ell_1$ criteria for worst-case disturbance rejection. The main motivation for these disturbance models
is  mathematical  convenience,  since  these  performance  measures  can  be  directly  combined  with  set 
descriptions  of  system  uncertainty  to  analyze  robust  performance  (see,  e.g.  [15]). 

This  approach  for  disturbance  modeling  is  pessimistic,  however,  since  it  ignores  a  substantial 
amount of information about empirical disturbances. It is often the case that these exhibit broadband
spectral characteristics (white noise, or some filtered version), especially when they describe
the  cumulative  macroscopic  effect  of  very  high  dimensional  fluctuations  at  the  microscopic  level. 
The  statistics  of  these  phenomena  have  been  very  accurately  modeled  by  the  theory  of  stochastic 
processes.  The  systematic  study  of  the  properties  of  dynamical  systems  under  stochastic  noise, 
pursued  by  stochastic  control  theory,  often  leads  to  tractable  results,  the  most  notable  being  the 
classical $\mathcal{H}_2$ (LQG) problem. The main limitation to its applicability is that noise is rarely the
prevailing  source  of  uncertainty,  and  the  others  do  not  fit  easily  into  a  stochastic  description  ([20] 
contains  some  work  in  this  direction). 

The  robust  performance  question  one  would  really  want  to  address  in  many  practical  cases  is  the 
effect of white noise over sets of systems (the “Robust $\mathcal{H}_2$” problem). Many authors (see [25, 7] and
references  therein)  have  addressed  this  problem  in  terms  of  a  direct  combination  of  the  worst-case 
and  stochastic  frameworks,  and  have  succeeded  in  obtaining  upper  bounds  for  system  performance. 
At  this  time,  however,  this  approach  is  not  developed  to  a  competitive  level  with  other  performance 
measures  in  robust  control.  In  particular,  it  is  difficult  to  assess  the  conservatism  of  these  bounds 
since  they  involve  a  combination  of  worst-case  and  average  case  analysis. 

Another example of the difficulty of combining these frameworks is the relation between robust
control and mainstream system identification (as in [11]), since the latter relies on the stochastic
paradigm  for  noise.  Recent  efforts  in  pursuing  this  unification  in  the  worst-case  setting  have  once 
again  used  a  pessimistic  view  of  disturbances,  resulting  in  worst-case  identification  problems  with 
weak  consistency  properties  ([9,  26])  and  high  computational  complexity  ([6,  21]). 

In  this  paper  we  propose  a  new  methodology  for  white  noise  modeling,  aimed  at  resolving  these 
difficulties.  The  starting  point  is  the  following  question:  how  does  one  decide  whether  a  signal 
can  be  accurately  modeled  as  a  stochastic  white  noise  trajectory?  Deciding  this  from  experimental 
data  leads  to  a  statistical  hypothesis  test  on  a  finite  length  signal.  In  other  words,  one  will  accept  a 
signal  as  white  noise  if  it  belongs  to  a  certain  set.  The  main  idea  of  our  formulation  is  to  take  this 
set  as  the  definition  of  white  noise,  and  carry  out  the  subsequent  analysis  in  a  worst-case  setting. 

For  this  methodology  to  be  successful,  these  sets  should: 

• Exclude non-white signals (e.g. sinusoids), which are responsible for the conservatism of the
$\mathcal{H}_\infty$ and $\ell_1$ performance measures.

• Include likely instances of white noise. Here stochastic noise will be used as guidance for
the choice of a typical set, but not for average case analysis.

•  Have  simple  enough  descriptions  to  allow  for  tractable  worst-case  analysis. 

The  paper  is  organized  as  follows:  some  notation  is  established  in  Section  2.  In  Section  3,  the 
case  of  signals  over  a  finite  horizon  is  considered,  and  set  descriptions  of  white  noise  are  given 
both  from  the  time  and  the  frequency  domain  points  of  view.  These  sets  are  analyzed  in  terms  of 
the  worst-case  system  response  and  in  relation  to  stochastic  noise.  Section  4  contains  the  infinite 
horizon  version.  In  Section  5,  the  application  of  this  framework  both  to  Robust  7f2  analysis  and  to 
worst-case  system  identification  is  outlined.  Space  limitations  preclude  an  extensive  development 
of  these  directions;  the  objective  here  is  to  show  the  potential  of  this  methodology.  The  conclusions 
are given in Section 6, and some proofs are covered in the Appendix.

2 Assumptions and Notation

We will consider discrete time, causal, linear time invariant (LTI) stable systems of the form
$H(\lambda) = \sum_{t=0}^{\infty} h(t)\lambda^t$, where $\lambda$ is the shift operator. Most of the results will be presented for single
input/single output (SISO) systems; for the multivariable case see Section 4.3. In the SISO case
we will assume that $h(t) \in \ell_1$; this implies that the summation

$$r_h(\tau) := \sum_{t=0}^{\infty} h(t+\tau)\,h(t) \qquad (1)$$

converges for each $\tau$, defining the autocorrelation sequence of $H$, and furthermore that $r_h(\tau)$ is itself
an $\ell_1$ sequence, i.e. $\sum_{\tau=-\infty}^{\infty} |r_h(\tau)| < \infty$. The frequency response (Fourier transform of $h(t) \in \ell_1$) is
denoted by $H(e^{j\omega})$, and is a continuous function of $\omega$. The Fourier transform of $r_h(\tau)$ is the power
spectrum $s_h(\omega) := |H(e^{j\omega})|^2$. Also, the $\mathcal{H}_2$ norm of the system is given by

$$\|H\|_2^2 = r_h(0) = \sum_{t=0}^{\infty} h(t)^2 = \frac{1}{2\pi}\int_0^{2\pi} s_h(\omega)\,d\omega \qquad (2)$$
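
As a numerical illustration of (1) and (2), the following sketch computes the autocorrelation and the squared $\mathcal{H}_2$ norm of an FIR filter in the time domain, and compares the result with a Riemann sum of the power spectrum. The filter taps, the grid size and the helper names (`filter_autocorr`, `power_spectrum`) are assumptions made for this example and do not come from the paper; the sign convention in the exponent is also an assumption, which does not affect $|H(e^{j\omega})|^2$ for real taps.

```python
import cmath
import math

def filter_autocorr(h, tau):
    """r_h(tau) = sum_t h(t + tau) h(t) for a causal FIR filter h, tau >= 0."""
    return sum(h[t + tau] * h[t] for t in range(len(h) - tau))

def power_spectrum(h, omega):
    """s_h(omega) = |H(e^{j omega})|^2, with H(e^{j omega}) = sum_t h(t) e^{-j omega t}."""
    freq_resp = sum(h[t] * cmath.exp(-1j * omega * t) for t in range(len(h)))
    return abs(freq_resp) ** 2

h = [1.0, 0.5, 0.25]               # hypothetical FIR taps
h2_time = filter_autocorr(h, 0)    # r_h(0) = sum_t h(t)^2

# (1 / 2 pi) * integral of s_h over [0, 2 pi], approximated on M grid points.
# For an FIR filter s_h is a trigonometric polynomial, so the sum is exact
# (up to rounding) once M exceeds the filter length.
M = 64
h2_freq = sum(power_spectrum(h, 2 * math.pi * k / M) for k in range(M)) / M

print(h2_time, h2_freq)
```

The two printed values should agree to rounding error, which is the content of the last equality in (2).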


For some of the frequency domain bounds obtained in this paper, we will further assume that $s_h(\omega)$
is a function of bounded variation (in $BV[0, 2\pi]$). This means (see [22]) that

$$TV(s_h) := \sup_P \sum_{i=0}^{p-1} |s_h(\omega_{i+1}) - s_h(\omega_i)| < \infty \qquad (3)$$

where the supremum is over partitions $P = \{\omega_0, \ldots, \omega_p\}$ of the interval $[0, 2\pi]$. $TV(s_h)$ is the total
variation of $s_h$. The time domain condition $\sum_\tau |\tau\, r_h(\tau)| < \infty$ is sufficient for $s_h(\omega) \in BV[0, 2\pi]$.


3  The  Finite  Horizon  Case 

A reasonable starting point for white noise modeling is the case of a scalar valued, finite horizon,
discrete time sequence $x(0), \ldots, x(N-1)$ of length $N$. The infinite horizon version will be considered
in Section 4, which also covers the extension to vector-valued signals.

To analyze the response of a system with memory over this finite horizon, some convention must
be made on the “past” values of the input signals. The two simplest choices are either to assume
the system is initially at rest, or that it is in periodic steady state of period $N$. We will adopt the
latter, since it leads to a more tractable spectral theory: the sequence $x(0), \ldots, x(N-1)$ will be
identified with the periodic signal $x(t)$ of period $N$. This procedure is justified for analyzing stable
systems with time constants which are small compared to $N$, so that the system is not sensitive to
long range correlations in the input signals; this will be a standing assumption in this section.

The discrete Fourier transform (DFT) $X(k)$, $k = 0, \ldots, N-1$ of the sequence $x(t)$ is defined by
the relations

$$X(k) = \sum_{t=0}^{N-1} x(t)\, e^{-j\frac{2\pi}{N}kt} \; ; \qquad x(t) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j\frac{2\pi}{N}kt} \qquad (4)$$

The (circular) autocorrelation sequence of $x$ (correlogram) is given by

$$r_x(\tau) = \sum_{t=0}^{N-1} x(t+\tau)\,x(t), \qquad \tau = 0, \ldots, N-1 \qquad (5)$$

and the sequence power spectrum (periodogram) by $s_x(k) = |X(k)|^2$, $k = 0, \ldots, N-1$.

The sequences $r_x(\tau)$ and $s_x(k)$ form a DFT pair. For an $N$-periodic signal $x(t)$, we will use as
norm the energy over the period, $\|x\|^2 = r_x(0) = \frac{1}{N}\sum_{k=0}^{N-1} s_x(k)$.
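
The DFT pair property and the energy relation can be checked numerically. The sketch below is a plain standard-library implementation of (4) and (5); the test signal, its length and the function names are assumptions chosen for illustration.

```python
import cmath
import math
import random

def dft(x):
    """X(k) = sum_t x(t) exp(-j 2 pi k t / N), as in (4)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def correlogram(x):
    """Circular autocorrelation r_x(tau) = sum_t x(t + tau) x(t), indices mod N, as in (5)."""
    n = len(x)
    return [sum(x[(t + tau) % n] * x[t] for t in range(n)) for tau in range(n)]

def periodogram(x):
    """s_x(k) = |X(k)|^2."""
    return [abs(v) ** 2 for v in dft(x)]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]   # hypothetical test signal

r = correlogram(x)
s = periodogram(x)
dft_of_r = dft(r)

# The DFT of the correlogram should reproduce the periodogram ...
pair_error = max(abs(dft_of_r[k] - s[k]) for k in range(len(x)))
# ... and the energy r_x(0) should equal the average of the periodogram.
energy_error = abs(r[0] - sum(s) / len(x))
print(pair_error, energy_error)
```

Both errors should be at the level of floating point rounding.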
The  following  relations  follow  immediately  from  the  definitions. 

Lemma 1 Let $H$ be a SISO stable system ($h(t) \in \ell_1$). If $u(t)$ is an $N$-periodic input signal to $H$,
and $y = Hu$ is the corresponding steady state (periodic) output, then

(i) $r_y(\tau) = \sum_{t=-\infty}^{\infty} r_h(t)\, r_u(t-\tau)$ (6)

(ii) $s_y(k) = s_h\!\left(\frac{2\pi k}{N}\right) s_u(k)$ (7)
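
Relation (7) can be verified numerically for an FIR filter, for which the periodic steady state output is a circular convolution. This is a minimal sketch under that FIR assumption; the taps, the input and the helper names are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    """X(k) = sum_t x(t) exp(-j 2 pi k t / N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def periodic_response(h, u):
    """Steady state output of the FIR filter h driven by the N-periodic extension of u."""
    n = len(u)
    return [sum(h[s] * u[(t - s) % n] for s in range(len(h))) for t in range(n)]

h = [1.0, -0.6, 0.2]                             # hypothetical FIR taps
rng = random.Random(2)
u = [rng.uniform(-1.0, 1.0) for _ in range(12)]  # one period of the input
y = periodic_response(h, u)

N = len(u)
s_u = [abs(v) ** 2 for v in dft(u)]
s_y = [abs(v) ** 2 for v in dft(y)]
# Power spectrum of the filter sampled at omega = 2 pi k / N.
s_h = [abs(sum(h[t] * cmath.exp(-2j * math.pi * k * t / N) for t in range(len(h)))) ** 2
       for k in range(N)]

max_error = max(abs(s_y[k] - s_h[k] * s_u[k]) for k in range(N))
print(max_error)
```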

3.1  White  Noise  Descriptions  in  the  Time  Domain 

We  wish  to  characterize  white  signals  among  sequences  of  length  N;  when  faced  with  the  problem 
of  deciding  whether  an  empirical  signal  is  a  sample  of  white  noise,  a  statistician  will  perform  a 
hypothesis  test  in  terms  of  some  statistic.  A  common  choice  (see  [2,  11])  is  the  sample  correlogram, 
which should approximate the expected correlation for white noise (a delta function). In other
words, a scalar signal $x(t)$ is categorized as white if $r_x(\tau)/r_x(0)$ is small for $\tau$ in a certain range
(e.g. $1 \le \tau \le T$). For example¹, one can choose to specify that the correlogram (normalized to
$r_x(0) = 1$) must fall inside a band around zero, of width $\gamma$, as depicted in Figure 1.


Figure  1:  Correlogram  of  a  pseudorandom  sequence 


From the classical statistical point of view, the choice of $\gamma$ is associated with a level of significance
of the test, which in turn depends on some stochastic model. But regardless of the reasoning behind
this  choice,  ultimately  the  “whiteness”  of  the  signal  is  decided  in  terms  of  whether  it  belongs  or 
not  to  a  parametrized  set.  This  motivates  the  following: 

Definition 1 The set of signals of length $N$ which are white in the time domain sense (accuracy
$\gamma$, up to lag $T$) is defined by

$$W_{N,\gamma,T} := \{x \in \mathbb{R}^N : |r_x(\tau)| \le \gamma\, r_x(0),\ \tau = 1, \ldots, T\} \qquad (8)$$
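
Membership in the set of Definition 1 is a direct test on the circular correlogram. The sketch below is one possible implementation; the function names and the two example signals (a delta function and a sinusoid) are assumptions chosen to show one signal inside the set and one outside it.

```python
import math

def correlogram(x):
    """Circular autocorrelation r_x(tau), tau = 0, ..., N-1."""
    n = len(x)
    return [sum(x[(t + tau) % n] * x[t] for t in range(n)) for tau in range(n)]

def is_white_time(x, gamma, T):
    """True if |r_x(tau)| <= gamma * r_x(0) for tau = 1, ..., T (requires T <= N - 1)."""
    r = correlogram(x)
    return all(abs(r[tau]) <= gamma * r[0] for tau in range(1, T + 1))

N = 8
delta = [1.0] + [0.0] * (N - 1)
sinusoid = [math.cos(2 * math.pi * t / N) for t in range(N)]

# The delta function has zero autocorrelation at every nonzero lag,
# so it belongs to the set even for gamma = 0 and T = N - 1.
delta_is_white = is_white_time(delta, gamma=0.0, T=N - 1)

# A sinusoid has r_x(1) / r_x(0) = cos(2 pi / N), about 0.71 here,
# so it is rejected for gamma = 0.2.
sinusoid_is_white = is_white_time(sinusoid, gamma=0.2, T=3)
print(delta_is_white, sinusoid_is_white)
```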

The response of an LTI system to signals in such sets will now be analyzed from a worst-case
perspective. The worst gain of the system under signals in $W_{N,\gamma,T}$ (a seminorm on systems) will
be denoted

$$\|H\|_{W_{N,\gamma,T}} := \sup\left\{ \frac{\|y\|}{\|u\|} : y = Hu,\ u \in W_{N,\gamma,T},\ \|u\| \ne 0 \right\} \qquad (9)$$

¹ A common alternative is to bound the sum of the squares of a fixed number of correlogram values; in our context,
it is preferable to bound the maximum deviation.

Theorem 2 Suppose the conditions of Lemma 1 hold, and $u \in W_{N,\gamma,T}$. Then

$$\left| \frac{r_y(\tau)}{r_u(0)} - r_h(\tau) \right| \le \gamma \sum_{\substack{t=\tau-T \\ t \ne \tau}}^{\tau+T} |r_h(t)| + \sum_{|t-\tau|>T} |r_h(t)| \qquad (10)$$

Furthermore,

$$\left| \|H\|^2_{W_{N,\gamma,T}} - \|H\|_2^2 \right| \le \gamma \sum_{\substack{t=-T \\ t \ne 0}}^{T} |r_h(t)| + \sum_{|t|>T} |r_h(t)| \qquad (11)$$

and for $H$ FIR of length $T$,

$$\|H\|_2^2 \le \|H\|^2_{W_{N,\gamma,T}} \le \|H\|_2^2 + \gamma \sum_{\substack{t=-T \\ t \ne 0}}^{T} |r_h(t)| \qquad (12)$$

Proof: Equation (10) follows immediately from Lemma 1 and the definition of $W_{N,\gamma,T}$. Applying
(10) at $\tau = 0$ gives (11). The upper bound in (12) follows from (11), the lower bound from
the fact that the delta function is always a signal in the set $W_{N,\gamma,T}$.

□ 
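
The bound obtained by applying (10) at $\tau = 0$ can be checked on a concrete example. The sketch below draws an input from the white set by rejection sampling, filters it through an FIR system in periodic steady state, and compares the squared gain with $\|H\|_2^2$ plus or minus the correction term. The filter taps, $N$, $T$, $\gamma$, the sampling scheme and all function names are assumptions made for this illustration; it checks the bound for one input and does not compute the worst-case gain itself.

```python
import random

def circular_autocorr(x, tau):
    """r_x(tau) = sum_t x(t + tau) x(t), indices mod N."""
    n = len(x)
    return sum(x[(t + tau) % n] * x[t] for t in range(n))

def in_white_set(u, gamma, T):
    """Membership test for the time domain white set of Definition 1."""
    r0 = circular_autocorr(u, 0)
    return all(abs(circular_autocorr(u, tau)) <= gamma * r0 for tau in range(1, T + 1))

def periodic_response(h, u):
    """Periodic steady state output of the FIR filter h."""
    n = len(u)
    return [sum(h[s] * u[(t - s) % n] for s in range(len(h))) for t in range(n)]

def filter_autocorr(h, tau):
    """r_h(tau) for a causal FIR filter, tau >= 0."""
    return sum(h[t + tau] * h[t] for t in range(len(h) - tau))

h = [1.0, 0.5, 0.25]           # hypothetical FIR filter of length 3
N, T, gamma = 64, 3, 0.3
h2_sq = filter_autocorr(h, 0)
# gamma * sum over t = -T..T, t != 0, of |r_h(t)|; r_h is even and vanishes for |t| >= len(h).
slack = gamma * 2.0 * sum(abs(filter_autocorr(h, tau)) for tau in range(1, len(h)))

rng = random.Random(1)
u = [1.0] + [0.0] * (N - 1)    # fallback: the delta function always belongs to the set
for _ in range(1000):
    candidate = [rng.gauss(0.0, 1.0) for _ in range(N)]
    if in_white_set(candidate, gamma, T):
        u = candidate
        break

y = periodic_response(h, u)
gain_sq = sum(v * v for v in y) / sum(v * v for v in u)
print(h2_sq - slack, gain_sq, h2_sq + slack)
```

The middle printed value should lie between the other two, for any input accepted by `in_white_set`.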

Remarks: 

1. From inequality (10) we conclude that the autocorrelations of $y$ (up to a constant factor $\|u\|^2$)
lie in a band centered at the autocorrelations of the filter. Therefore, such a band is a natural
set description for colored noise, the output of a linear filter under white noise.

2. It can be shown (see [16]) that for sufficiently small $\gamma$ and large enough $N$, the upper bound in (12)
is achieved. This is no longer true for large values of $\gamma$; for example, if $\gamma = 1$, there are no
restrictions on the input signal, and the squared induced norm can be bounded by the squared
$\mathcal{H}_\infty$ norm of the system, which in the FIR case is equal to

$$\sup_\omega \left( r_h(0) + 2\sum_{\tau=1}^{T} r_h(\tau)\cos\omega\tau \right)$$

and is in general strictly less than the bound (12). The role of $\gamma$ in this worst-case approach
is to parametrize the freedom allowed in the disturbance signal, and results in a worst-case
gain which varies from the $\mathcal{H}_2$ norm for $\gamma = 0$ to the $\mathcal{H}_\infty$ norm for $\gamma = 1$.
Although the choice $\gamma = 0$ would give a clean worst-case theory of white noise rejection,
it would mean trading the pessimistic disturbance modeling of $\mathcal{H}_\infty$ for an overly optimistic
alternative, since a realistic finite horizon signal will not have exactly zero autocorrelations.

3. In the general case, the parameter $T$ also plays a role, and its adequacy depends on the time
constants of the system, as follows from (11). The case $T = N - 1$ is considered below.

There is no absolute answer as to what is a “realistic” white noise signal, but the strongest
motivation for these disturbances comes from high dimensional fluctuations (e.g. particle agitation).
These have been classically modeled as stochastic processes, but could also be interpreted in the
context of deterministic chaos (see [23]). In any event, stochastic noise is known to provide a good
model, regardless of whether the probability measure is due to chance or is the ergodic measure of
a chaotic system. Therefore, a natural requirement for a realistic white noise set $W_{N,\gamma,T}$ is that it
should have large probability for stochastic white signals. In the statistical language, this refers to
the level of significance of the hypothesis test for white noise. We will analyze this asymptotically,
when the length $N$ of the data record goes to infinity and $\gamma$, $T$ are functions of $N$.

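Theorem 3 For each $N$ let $x^N = (x(0), \ldots, x(N-1))$ be a vector of independent, identically
distributed random variables, with zero mean and finite variance, and $\gamma_N > 0$.

1. If $T$ is fixed, and $\gamma_N \sqrt{N} \to \infty$ as $N \to \infty$, then $\mathcal{P}(x^N \in W_{N,\gamma_N,T}) \to 1$.

2. If the $x(t)$ are bounded, and $\gamma_N\,[\ldots] \to \infty$ as $N \to \infty$, then $\mathcal{P}(x^N \in W_{N,\gamma_N,N-1}) \to 1$.

3. If the $x(t)$ are Gaussian, and $\gamma_N\,[\ldots]/\log(N) \to \infty$ as $N \to \infty$, then $\mathcal{P}(x^N \in W_{N,\gamma_N,N-1}) \to 1$.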

Remarks: 


• Part 1 of Theorem 3 follows easily from well known results on asymptotic normality of the
correlogram: there is substantial averaging between the length $N$ of the time series and the
statistic of length $T$ which is employed.

• Parts 2 and 3, with $T$ set to $N - 1$, are deeper since there is no averaging: we are imposing
constraints of essentially the same dimension as the sample length. These statements are
apparently not found in the statistical literature; a proof is given in the Appendix.
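
Part 1 of Theorem 3 can be explored with a small Monte Carlo experiment. The sketch below estimates the probability that an i.i.d. Gaussian sequence falls in the time domain white set with fixed $T$, for a hypothetical schedule $\gamma_N = N^{-1/4}$, so that $\gamma_N \to 0$ while $\gamma_N\sqrt{N} \to \infty$. The schedule, the sample sizes, the number of trials and the function names are assumptions for illustration only; the estimates are empirical frequencies, not the probabilities themselves.

```python
import random

def in_white_set(x, gamma, T):
    """True if |r_x(tau)| <= gamma * r_x(0) for tau = 1, ..., T (circular correlogram)."""
    n = len(x)
    r0 = sum(v * v for v in x)
    for tau in range(1, T + 1):
        r_tau = sum(x[(t + tau) % n] * x[t] for t in range(n))
        if abs(r_tau) > gamma * r0:
            return False
    return True

def acceptance_rate(N, gamma, T, trials, rng):
    """Fraction of i.i.d. Gaussian samples of length N that land in the white set."""
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(N)]
        hits += in_white_set(x, gamma, T)
    return hits / trials

rng = random.Random(0)
T = 3
rates = {}
for N in (16, 256, 4096):
    gamma_N = N ** -0.25      # gamma_N -> 0 while gamma_N * sqrt(N) = N**0.25 -> infinity
    rates[N] = acceptance_rate(N, gamma_N, T, trials=40, rng=rng)
print(rates)
```

Under these assumptions the acceptance frequency should approach one as $N$ grows, even though the band width $\gamma_N$ shrinks.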

The previous theorem has provided a very tight “typical set” for stochastic white noise: we
argue that for many purposes, we can now ignore the probability measure and perform worst-case
analysis over this set. One such case is disturbance rejection: by choosing $\gamma_N \to 0$ (as $N \to \infty$) at
a sufficiently slow rate, we find that the set $W_{N,\gamma_N,N-1}$ has asymptotically probability 1 and also
$\|H\|_{W_{N,\gamma_N,N-1}} \to \|H\|_2$ as $N \to \infty$. We have therefore reinterpreted the $\mathcal{H}_2$ norm (asymptotically) as the
worst-case gain over a typical set, rather than the average gain. Another situation where the
probabilistic assumption can be replaced by a typical set is in the context of system identification,
as will be discussed in Section 5.2.

Finally,  we  remark  that  this  approach  to  modeling  based  on  sets  can  be  applied  regardless  of 
any  stochastic  assumptions  on  the  noise  source:  what  matters  is  the  statistical  information  (which 
may  be  obtained  directly  from  empirical  correlograms),  not  the  generating  mechanism. 

The main pending question at this point is whether the chosen sets lend themselves to tractable
worst-case analysis. This will be discussed in Section 5.

3.2  Frequency  Domain  Descriptions 

As the name implies, a white signal has a flat distribution of energy across frequency, which in the
finite horizon case would correspond to a flat periodogram (the DFT of a delta-function correlogram).
The “raw” periodogram is typically very erratic, however, as demonstrated in Figure 2.
This  fact  has  long  been  recognized  (see,  e.g.  [2,  5])  in  the  statistical  spectral  analysis  literature; 
correspondingly,  the  standard  methods  for  power  spectrum  estimation  are  based  on  smoothing  the 
periodogram,  by  some  form  of  local  averaging  that  reduces  the  fluctuations  (the  variance).  This 
smoothing  is  most  commonly  done  by  convolution  of  the  periodogram  with  a  window  function;  an 
abundant literature (see [8]) has studied shapes and properties of these windows.

Figure 2: Periodogram of a pseudorandom sequence


In this paper we are interested in defining a set of typical periodograms, which is a hypothesis
testing problem. Of course, the image of $W_{N,\gamma,T}$ under the DFT is such a set, but it does not have
a  simple  description  in  terms  of  the  frequency  domain  coordinates:  the  whole  purpose  of  using  the 
frequency  domain  would  be  defeated  with  that  description.  We  will  therefore  pursue  a  different 
characterization  for  the  frequency  domain  which  relies  entirely  on  periodogram  properties.  One 
alternative  is  to  specify  that  a  “windowed”  version  of  the  periodogram  be  flat  (this  was  pursued  in 
[16])  but  it  is  preferable  to  have  a  test  which  does  not  depend  on  a  choice  of  window. 

A  very  convenient  alternative  is  provided  by  the  Bartlett  cumulative  periodogram  test  (see  [2, 
8]),  which  consists  of  accumulating  the  periodogram  and  comparing  the  result  to  a  linear  function. 
Figure  3  contains  the  result  of  the  accumulation  process  on  the  periodogram  of  Figure  2.  As  we 
see,  the  fluctuations  have  been  smoothed  by  this  integration  and  the  result  approximates  a  linear 
function in a uniform sense; this is the essence of the definition which follows.


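Definition 2 The set of white signals in the frequency domain sense, with accuracy $\eta$, is defined
by

$$W_{N,\eta} := \left\{ x \in \mathbb{R}^N : \left| \frac{1}{N}\sum_{k=0}^{m-1} s_x(k) - \frac{m}{N}\|x\|^2 \right| \le \eta\,\|x\|^2,\ 1 \le m \le N \right\} \qquad (13)$$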
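
The cumulative periodogram test of Definition 2 translates directly into code. The sketch below accumulates the periodogram and compares it with the linear function $\frac{m}{N}\|x\|^2$; the function names and the two example signals are assumptions for illustration.

```python
import cmath
import math

def periodogram(x):
    """s_x(k) = |X(k)|^2 with X the DFT of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]

def is_white_freq(x, eta):
    """True if the cumulative periodogram stays within eta * ||x||^2 of (m / N) * ||x||^2."""
    n = len(x)
    s = periodogram(x)
    energy = sum(v * v for v in x)
    cumulative = 0.0
    for m in range(1, n + 1):
        cumulative += s[m - 1] / n
        if abs(cumulative - (m / n) * energy) > eta * energy:
            return False
    return True

N = 8
delta = [1.0] + [0.0] * (N - 1)
sinusoid = [math.cos(2 * math.pi * t / N) for t in range(N)]

# The delta function has a flat periodogram, so its cumulative periodogram is exactly linear.
delta_is_white = is_white_freq(delta, eta=0.01)
# The sinusoid concentrates its energy in two DFT bins, so the cumulative
# periodogram is a staircase far from the linear function.
sinusoid_is_white = is_white_freq(sinusoid, eta=0.2)
print(delta_is_white, sinusoid_is_white)
```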


Figure 3: Cumulative periodogram and bounds for $W_{N,\eta}$

We will now support the frequency domain definition by exhibiting properties which parallel
those in the time domain. The worst-case induced norm of a system $H$ under signals in the set
$W_{N,\eta}$ will be denoted $\|H\|_{W_{N,\eta}}$.

Theorem 4 Consider a stable LTI system $H$, with $s_h(\omega) \in BV[0, 2\pi]$. Then

$$\left| \|H\|_2^2 - \|H\|^2_{W_{N,\eta}} \right| \le \left( \frac{1}{N} + \eta \right) TV(s_h) \qquad (14)$$

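Proof:

Fix $u \in W_{N,\eta}$, $\|u\|^2 = 1$. Define $\Gamma(k)$ by $\Gamma(0) = 0$, $\Gamma(m) := \frac{1}{N}\sum_{k=0}^{m-1} s_u(k)$, $1 \le m \le N$. Note
that $\Gamma(N) = 1$. Let $y = Hu$, and for simplicity denote $s_h(k)$ in place of $s_h\!\left(\frac{2\pi k}{N}\right)$. We have

$$\|Hu\|^2 = \frac{1}{N}\sum_{k=0}^{N-1} s_h(k)\,s_u(k) = \sum_{k=0}^{N-1} s_h(k)\,(\Gamma(k+1) - \Gamma(k)) = s_h(N-1) + \sum_{k=1}^{N-1} (s_h(k-1) - s_h(k))\,\Gamma(k) \qquad (15)$$

Similar calculations show that

$$\frac{1}{N}\sum_{k=0}^{N-1} s_h(k) = s_h(N-1) + \sum_{k=1}^{N-1} (s_h(k-1) - s_h(k))\,\frac{k}{N} \qquad (16)$$

From (15), (16) we obtain (note that $u \in W_{N,\eta}$ implies $|\Gamma(k) - \frac{k}{N}| \le \eta$)

$$\left| \|y\|^2 - \frac{1}{N}\sum_{k=0}^{N-1} s_h(k) \right| \le \sum_{k=1}^{N-1} |s_h(k-1) - s_h(k)|\,\left|\Gamma(k) - \frac{k}{N}\right| \le \eta\, TV(s_h) \qquad (17)$$

Also, by bounding the difference between the integral $\|H\|_2^2 = \int_0^{2\pi} s_h(\omega)\,\frac{d\omega}{2\pi}$ and a step function
approximation, it follows that

$$\left| \|H\|_2^2 - \frac{1}{N}\sum_{k=0}^{N-1} s_h\!\left(\frac{2\pi k}{N}\right) \right| \le \frac{1}{N}\, TV(s_h) \qquad (18)$$

which together with (17) leads to (14).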

□ 
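
The summation by parts argument behind (17) can be checked numerically. In the sketch below, `eta_u` is the smallest accuracy for which a given unit-norm input belongs to the frequency domain white set, and the variation of $s_h$ is measured on the DFT grid only, which is a lower bound on $TV(s_h)$; the inequality being tested is therefore slightly stronger than (17). The filter, the input and all names are assumptions for this example.

```python
import cmath
import math
import random

def dft_power(x):
    """|X(k)|^2 for the DFT of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]

h = [1.0, 0.5, 0.25]                          # hypothetical FIR filter
N = 32
rng = random.Random(3)
u = [rng.gauss(0.0, 1.0) for _ in range(N)]
norm = math.sqrt(sum(v * v for v in u))
u = [v / norm for v in u]                     # normalize so that ||u||^2 = 1

s_u = dft_power(u)
s_h = dft_power(h + [0.0] * (N - len(h)))     # s_h(2 pi k / N) via a zero-padded DFT

# Cumulative periodogram: cumulative[m] = (1/N) sum_{k<m} s_u(k).
cumulative = [0.0]
for k in range(N):
    cumulative.append(cumulative[-1] + s_u[k] / N)
eta_u = max(abs(cumulative[m] - m / N) for m in range(1, N + 1))

output_energy = sum(s_h[k] * s_u[k] for k in range(N)) / N   # ||Hu||^2, using Lemma 1
grid_mean = sum(s_h) / N
grid_variation = sum(abs(s_h[k - 1] - s_h[k]) for k in range(1, N))

lhs = abs(output_energy - grid_mean)
rhs = eta_u * grid_variation
print(lhs, rhs)
```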

In reference to the properties of the set $W_{N,\eta}$ in the case of stochastic noise, these have been
studied in the statistical literature. We state the following result (see the Appendix):

Theorem 5 Let $x(0), \ldots, x(N-1), \ldots$ be independent, identically distributed, zero mean Gaussian
random variables. If $\eta_N \sqrt{N} \to \infty$ as $N \to \infty$, then $\mathcal{P}\left((x(0), \ldots, x(N-1)) \in W_{N,\eta_N}\right) \to 1$.

These asymptotic properties show that the frequency domain definition is adequate from the
point of view of the objectives of this paper: provided $\eta_N \to 0$ and $\eta_N\sqrt{N} \to \infty$, the worst case
disturbance rejection measure approaches the $\mathcal{H}_2$-norm of the system, while the class of signals
contains asymptotically all typical instances of stochastic white noise. Thus the families of time
and frequency domain sets have asymptotically the same properties, although they are different for
any fixed $N$.

4  The  Infinite  Horizon  Case 

The  role  of  infinite  horizon  signals  in  mathematical  modeling  is  that  of  an  abstraction  to  capture 
the  behavior  of  signals  and  systems  over  a  long,  but  unspecified  horizon;  the  chosen  mathematical 
framework  must  extend  naturally  the  finite  horizon  properties  and  lead  to  tractable  analysis. 

Two frameworks arise naturally for the study of deterministic spectral analysis: bounded power
signals and bounded energy ($\ell_2$) signals.

4.1 Bounded Power Signals

There  is  a  long  historical  tradition  in  a  non-stochastic  theory  of  white  noise,  going  as  far  back 
as  Wiener  (see  [28]),  who  considered  ergodicity  properties  to  build  a  spectral  theory  of  stationary 
signals  devoid  of  probability.  For  disturbance  rejection  problems,  this  approach  was  followed  in 
Zhou et al. [30], who considered the class of bounded power signals, defined by

$$\mathcal{BP} = \left\{ x(t) : r_x(\tau) = \lim_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N} x(t+\tau)\,x(t) \ \text{exists for each } \tau \right\} \qquad (19)$$

This  class  is  well  motivated  since  it  includes  with  probability  one  trajectories  of  a  strictly  stationary 
ergodic  random  process.  Also,  similar  properties  are  obtained  in  the  context  of  deterministic  chaos. 

The function $r_x(\tau)$ is the autocorrelation of the signal, and the power $\|x\|_P = (r_x(0))^{1/2}$ plays
the role of a seminorm (with some restrictions, see below). Also, Bochner’s theorem (see [4]) shows
that there exists a spectral distribution function $S_x(\omega)$, $\omega \in [0, 2\pi]$, such that $r_x(\tau)$ is recovered from
the Stieltjes integral

$$r_x(\tau) = \frac{1}{2\pi}\int_0^{2\pi} e^{j\omega\tau}\, dS_x(\omega) \qquad (20)$$

Equivalently, there exists a positive spectral measure which is the Fourier transform of $r_x(\tau)$; this
allows for periodic effects, which correspond to atoms of this measure. It also includes the case of
an absolutely continuous spectrum, with the corresponding spectral density $s_x(\omega) = dS_x(\omega)/d\omega$.

We now proceed to give set descriptions of white noise signals in $\mathcal{BP}$, motivated by the finite
horizon definitions. In the time domain, define

$$W_{\gamma,T} := \{x \in \mathcal{BP} : |r_x(\tau)| \le \gamma\, r_x(0),\ \tau = 1, \ldots, T\} \qquad (21)$$

In the frequency domain, Definition 2 extends by comparing the cumulative spectrum $S_x(\omega)$ with
a linear function:

$$W_\eta = \left\{ x \in \mathcal{BP} : \left| S_x(\omega) - \|x\|_P^2\,\omega \right| \le \eta\,\|x\|_P^2 \quad \forall \omega \in [0, 2\pi] \right\} \qquad (22)$$

One can also consider the ideal white noise set $W_{0,\infty} = W_0$ of signals with autocorrelation equal
to a delta function, i.e. flat spectral density. In fact, this class contains with probability one trajectories
of stochastic white noise (see the Appendix):

Proposition 6 Let $x = (x(0), \ldots, x(t), \ldots)$ be a sequence of independent, identically distributed
random variables, with zero mean and finite variance. Then $\mathcal{P}(x \in W_{0,\infty}) = 1$.

We now turn to the properties of a stable system with input in $\mathcal{BP}$. To ensure that the output
is a $\mathcal{BP}$ signal poses some rather technical issues which we will not address here (the $\ell_2$ setting
considered later on is more convenient for this). For the moment let us assume, following [30]², that
both input and output are in $\mathcal{BP}$, and satisfy the basic relations

(i) $r_y(\tau) = \sum_{t=-\infty}^{\infty} r_h(t)\, r_u(t-\tau)$ (23)

(ii) $dS_y(\omega) = |H(e^{j\omega})|^2\, dS_u(\omega)$ (24)

The worst-case gain in power for signals in the classes $W_{\gamma,T}$ (or $W_\eta$) is defined by

$$\|H\|_{W_{\gamma,T}\,(W_\eta)} := \sup\{\|y\|_P : y = Hu,\ u \in W_{\gamma,T}\,(W_\eta),\ \|u\|_P = 1\} \qquad (25)$$

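It follows immediately from (23) that

$$\|H\|_2^2 \le \|H\|^2_{W_{\gamma,T}} \le \|H\|_2^2 + \gamma \sum_{\substack{t=-T \\ t \ne 0}}^{T} |r_h(t)| + \sum_{|t|>T} |r_h(t)| \qquad (26)$$

For the frequency domain case assume $s_h \in BV[0, 2\pi]$. Consider $u \in W_\eta$, $\|u\|_P = 1$; an integration
by parts yields

$$\|y\|_P^2 = \frac{1}{2\pi}\int_0^{2\pi} s_h(\omega)\, dS_u(\omega) = s_h(0) - \frac{1}{2\pi}\int_0^{2\pi} S_u(\omega)\, ds_h(\omega) \qquad (27)$$

Similarly, $\|H\|_2^2 = s_h(0) - \frac{1}{2\pi}\int_0^{2\pi} \omega\, ds_h(\omega)$, from where

$$\left| \|y\|_P^2 - \|H\|_2^2 \right| \le \frac{1}{2\pi}\int_0^{2\pi} |S_u(\omega) - \omega|\; |ds_h(\omega)| \qquad (28)$$

and therefore

$$\|H\|_2^2 \le \|H\|^2_{W_\eta} \le \|H\|_2^2 + \frac{\eta\, TV(s_h)}{2\pi} \qquad (29)$$

² [30] states that if $u \in \mathcal{BP}$, $u \in \ell_\infty$, and the system is exponentially stable, the output is in $\mathcal{BP}$ and (23-24) hold.

In particular, the system $\mathcal{H}_2$ norm can be motivated as the gain in power under signals in
$W_{0,\infty} = W_0$, or equivalently by the limit norms

$$\lim_{\substack{\gamma \to 0 \\ T \to \infty}} \|H\|_{W_{\gamma,T}} \; ; \qquad \lim_{\eta \to 0} \|H\|_{W_\eta} \qquad (30)$$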


It is useful to compare this approach with the one used in [30]. The induced gain in power is
used there to motivate the $\mathcal{H}_\infty$ system norm; in contrast, the $\mathcal{H}_2$ norm is presented as a “mixed-induced”
norm using different seminorms in input and output spaces (power in the output space,
and a spectral seminorm in the input, based on the peak spectral density).

In this paper we use power-to-power for both cases: for the $\mathcal{H}_2$ norm, instead of changing
the input seminorm, we constrain the inputs to belong to the class of white signals. This seems
more direct (worst-case gain over the class of disturbances one expects to see), and allows for a
comparison of the $\mathcal{H}_2$ and $\mathcal{H}_\infty$ system norms. The main advantage of our formulation, however,
will be made clear in Section 5, where whiteness constraints are incorporated in robustness analysis.


4.2 $\ell_2$ Setting

Although the class $\mathcal{BP}$ is conceptually an adequate non-stochastic framework for white noise signals,
it is sometimes inconvenient due to its limited mathematical structure. In particular, it is not a vector
space (not being closed under addition, see [12]), so it is not a seminormed space. For this reason
we will now consider white noise descriptions inside $\ell_2$, which has the structure of a Hilbert space.

At  first,  this  seems  unnatural,  since  white  noise  signals  are  typically  considered  to  be  stationary, 
so  they  will  not  decay  as  time  goes  to  infinity.  As  far  as  characterizing  system  response  to  signals 
with flat spectrum, however, the response to $\ell_2$ signals is just as representative as the response to
bounded power signals: the “behavior at $\infty$” should not be the determining factor in any sensible
engineering model. For example, the response of an LTI system to a bounded power signal is
approximately the same as the response to a very long truncation.

Actually, the same considerations apply to standard $\mathcal{H}_\infty$ theory. While the $\mathcal{H}_\infty$ norm is most
naturally motivated [30] by the gain in power for bounded power inputs, since this class includes
sinusoids, most technical results on $\mathcal{H}_\infty$ are obtained by using $\ell_2$ as a signal space, which does not
contain these signals, but contains signals of arbitrarily narrow bandwidth.

For $\ell_2$ (square-summable) sequences, the autocorrelation is defined by $r_x(\tau) = \langle x, \lambda^\tau x \rangle$. The corresponding
spectral measure is absolutely continuous, with spectral density $\frac{dS_x}{d\omega} = |X(e^{j\omega})|^2$,
where $X(e^{j\omega})$ is the Fourier transform of $x(t)$.

The sets $W_{\gamma,T}$ and $W_\eta$ over $\ell_2$ can then be defined as in (21) and (22); the same properties hold
for system gain, where the signal norm is now taken to be the $\ell_2$ norm.

4.3  Multivariable  Extension 

This section outlines how the previous methodology can be extended to deal with vector-valued
white noise signals. We will only consider the case of infinite horizon $l_2$ signals, which demonstrates
all the necessary extensions; the same ideas could be applied in a finite horizon setting.

For vector-valued signals $x(t) \in l_2(\mathbb{R}^n)$, the matrix autocorrelation (prime denotes transpose, *
denotes conjugate transpose) is given by

$$ R_x(\tau) = \sum_{t=-\infty}^{\infty} x(t+\tau)\, x'(t) \qquad (31) $$

Once again, a spectral (matrix) distribution function $S_x(\omega)$ is defined, verifying a matrix version
of (20). In this $l_2$ case it has density $s_x(\omega) = X(e^{j\omega}) X^*(e^{j\omega})$, where the column vector $X(e^{j\omega})$ is the
Fourier transform of $x(t)$. The 2-norm of the signal verifies

$$ \|x\|_2^2 = \mathrm{trace}(R_x(0)) = \frac{1}{2\pi} \int_0^{2\pi} \mathrm{trace}(s_x(\omega))\, d\omega \qquad (32) $$

Consider a stable, discrete time linear time invariant system with in general $n$ inputs and $p$
outputs, $H(\lambda) = \sum_{t=0}^{\infty} h(t) \lambda^t$, with frequency response $H(e^{j\omega})$.

Defining $R_h(\tau) = \sum_{t=0}^{\infty} h(t+\tau) h'(t)$ and $s_h(\omega) = H(e^{j\omega}) H^*(e^{j\omega})$, the $\mathcal{H}_2$ norm of $H$ satisfies
(32). If $u(t) \in l_2(\mathbb{R}^n)$, $y(t) \in l_2(\mathbb{R}^p)$ are, respectively, the input and output to $H$, then:

$$ R_y(\tau) = \sum_{t=0}^{\infty} \sum_{s=0}^{\infty} h(s)\, R_u(\tau + t - s)\, h'(t) \qquad (33) $$

$$ s_y(\omega) = H(e^{j\omega})\, s_u(\omega)\, H(e^{j\omega})^* \qquad (34) $$

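Now we give set descriptions of vector-valued white noise. For a matrix $A$ denote $\|A\|_\infty =
\max_{i,j} |a_{ij}|$ and define the following set in terms of time domain constraints

$$ W^n_{\gamma,T} := \left\{ x(t) \in l_2(\mathbb{R}^n) : \left\| R_x(\tau) - \delta(\tau) \frac{\|x\|_2^2}{n} I_n \right\|_\infty \le \gamma \|x\|_2^2,\ |\tau| \le T \right\} \qquad (35) $$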


In (35) we impose low autocorrelation for $1 \le \tau \le T$, and also low “spatial” correlation between
the components of the vector signal. The choice of the matrix norm in (35) is somewhat arbitrary;
the previous choice has the advantage of imposing quadratic signal constraints (see Section 5).

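For a frequency domain characterization, define

$$ W^n_\eta := \left\{ x \in l_2(\mathbb{R}^n) : \left\| S_x(\omega) - \omega \frac{\|x\|_2^2}{n} I_n \right\| \le \eta \|x\|_2^2,\ \omega \in [0, 2\pi] \right\} \qquad (36) $$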


Defining $\|H\|_{W^n_{\gamma,T}}$, $\|H\|_{W^n_\eta}$ as usual, bounds similar to (26), (29) can be obtained from (33),
(34), leading to

$$ \lim_{\gamma \to 0,\ T \to \infty} \|H\|_{W^n_{\gamma,T}} = \lim_{\eta \to 0} \|H\|_{W^n_\eta} = \frac{1}{\sqrt{n}} \|H\|_2 \qquad (37) $$

Remarks:

•  The factor $\frac{1}{\sqrt{n}}$ arises from the use of the same norm in input and output space. It can
also be motivated for stochastic noise: if the input has covariance matrix $I$, the expected input
power is $\sqrt{n}$, and the expected output power is $\|H\|_2$.

•  In $l_2$ space there are no ideally white signals ($R_x(\tau) = \delta(\tau) I$, or $S_x(\omega) = \omega I$), since this
would imply $s_x(\omega) = I$, and $s_x(\omega)$ is a rank one matrix for each $\omega$. “Pure” white multivariable
noise appears only in the bounded power approach. For $\gamma > 0$, $\eta > 0$, however, the $l_2$ sets
$W_{\gamma,\infty}$ and $W_\eta$ are non-trivial, giving arbitrary approximations to white noise which can be
used via (37) to motivate the $\mathcal{H}_2$ norm within the $l_2$ framework.

5  Worst-Case Analysis over White Noise Sets


The  previous  sections  have  provided  set  descriptions  of  white  noise  signals  aimed  at  worst-case 
analysis,  and  have  shown  that  this  procedure  is  sound  and  gives  results  which  are  consistent  with 
the  alternative  stochastic  setting. 

We will now show that this approach leads to tractable worst-case analysis by showing
applications of this framework to two different problems mentioned in the introduction: robust $\mathcal{H}_2$
performance analysis and worst-case system identification. We will not attempt to present a full
description of these directions in the limited space available here; they have been developed
elsewhere [18, 17, 19, 27], and journal versions are in preparation. In this paper our objective is to
provide enough evidence that this methodology has useful implications.


5.1  Robust $\mathcal{H}_2$ Analysis


A problem which has received substantial attention (e.g. [25, 7]) is that of obtaining robust
performance guarantees for a set of systems subject to white noise disturbances.


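[Figure 4: Uncertain system. Block diagram of the perturbation $\Delta$ interconnected with the nominal system $G$, partitioned as $G_{11}$, $G_{12}$, $G_{21}$, $G_{22}$, with signals $v$ and $u$.]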


In the system of Figure 4, $G$ is a known (nominal) map which is assumed to be an LTI system.
The perturbation $\Delta$ represents the system uncertainty, which is assumed to have block diagonal
structure of the form $\Delta = \mathrm{diag}[\Delta_1, \ldots, \Delta_F]$ (each $\Delta_i$ is of size $q_i$), and is normalized to a set
$B_\Delta := \{\Delta : \|\Delta\| \le 1\}$. For background and motivation for this setup, see [15].

The objective is to analyze rejection properties of the system to a white noise disturbance
applied in $u$, in the worst case over $\Delta \in B_\Delta$. If the perturbation $\Delta$ is assumed to be LTI, this
corresponds to finding the worst-case $\mathcal{H}_2$ norm of the closed loop transfer function from $u$ to $y$; we
will mostly deal with linear time-varying (LTV) $\Delta$'s here, but still refer to robust $\mathcal{H}_2$ performance
with some abuse of terminology.

Set descriptions will be applied to describe the white noise disturbance $u$; as argued in Section
4.2, it is sufficient to consider the sets $W_{\gamma,T}$ or $W_\eta$ inside $l_2$ space. The robust performance analysis
problem (e.g., for $W_{\gamma,T}$) is therefore to compute

$$ \sup_{\substack{\Delta \in B_\Delta \\ u \in W_{\gamma,T},\ \|u\|_2 \le 1}} \|y\|_2 \qquad (38) $$

Before  addressing  this  problem  we  review  how  this  question  can  be  handled  when  there  is  no 
constraint  on  u. 


5.1.1  Background on Robust $\mathcal{H}_\infty$ Analysis


The robust performance question most commonly treated in the literature deals with $\mathcal{H}_\infty$
performance, which refers to the worst-case gain of the system as an operator on $l_2$. For the system
in Figure 4, for the case where $\Delta$ is a structured, otherwise arbitrary linear operator on $l_2$, the
following necessary and sufficient condition has been obtained [24, 14]:

$$ \sup_{\substack{\Delta \in B_\Delta \\ \|u\|_2 \le 1}} \|y\|_2 < 1 \iff \exists X \in \mathbf{X} :\ G(e^{j\omega})^* \begin{bmatrix} X & 0 \\ 0 & I \end{bmatrix} G(e^{j\omega}) - \begin{bmatrix} X & 0 \\ 0 & I \end{bmatrix} < 0 \qquad (39) $$


In (39), $\mathbf{X}$ is the set of positive scaling matrices of the form $X = \mathrm{diag}[x_1 I_{q_1}, \ldots, x_F I_{q_F}]$. Since $\mathbf{X}$ is
convex and so is the condition (39), this test has tractable computational properties (see [3, 15]).

We now briefly explain how this result can be obtained from the Integral Quadratic Constraint
(IQC) formulation, which originated in the work of Yakubovich [29], and has recently been applied
to this $\mathcal{H}_\infty$ problem by Megretski [14].

Let $z$ be the vector of all the inputs to system $G$, $z = (z_1, \ldots, z_F, z_{F+1})$, where $(z_1, \ldots, z_F)$ is
the partition of the signal $v$ in terms of the blocks of $\Delta$, and $z_{F+1} = u$. Analogously, $(Gz)_i$, $i =
1, \ldots, F+1$ denote the partitions of the output of $G$.

Now define the following scalar valued quadratic functions of $z \in l_2$,

$$ \sigma_i(z) = \|(Gz)_i\|_2^2 - \|z_i\|_2^2, \quad i = 1, \ldots, F+1 \qquad (40) $$

The main observation is that if the $\sigma_i(z)$, $i = 1, \ldots, F+1$ are all non-negative for a certain
$z \neq 0$, then $G$ expands this signal in every channel, and therefore contractive LTV perturbations
$\Delta_1, \ldots, \Delta_F$ exist such that the closed loop is expansive at $z$, violating robust $\mathcal{H}_\infty$ performance.

Robust performance is thus converted to a condition on the sign of a finite number $\sigma_1, \ldots, \sigma_{F+1}$
of quadratic forms on $l_2$. (39) now follows from the application of the following result by Megretski
and Treil [14]: Given $\sigma_1, \ldots, \sigma_{F+1}$, where each $\sigma_i : l_2 \to \mathbb{R}$ is a shift invariant quadratic form in $l_2$,
the following are equivalent:

1.  There does not exist $z \in l_2$ such that $\sigma_i(z) > 0$, $i = 1, \ldots, F+1$.

2.  There exist $x_i \ge 0$, $i = 1, \ldots, F+1$, not all zero, such that $x_1 \sigma_1 + \ldots + x_{F+1} \sigma_{F+1} \le 0$.

Note that the only non-trivial direction is $1 \Rightarrow 2$; this is called “S-procedure losslessness”. Applied
to (40), condition 2 then leads to (39), with $X = \mathrm{diag}[x_1 I, \ldots, x_F I]/x_{F+1}$. Some refinements of
these arguments are needed (see [14]) to obtain strict inequalities, and $x_i > 0$.

5.1.2  Robust  Performance  Analysis  over  the  Signal  Set 

We now show that the robust performance problem remains tractable when the disturbance $u$ is
constrained to vary in the white noise set $W_{\gamma,T}$. We consider for simplicity the case of scalar noise;
similar arguments apply to the multivariable case. The main observation is that $W_{\gamma,T}$ is described
by a finite number of constraints

$$ \gamma r_u(0) \pm r_u(\tau) \ge 0, \quad \tau = 1, \ldots, T \qquad (41) $$

which are quadratic in the signal $u$. In other words, they are IQCs (this was already suggested in
[13]) corresponding to the quadratic forms $\sigma_\tau^+(u) = \gamma r_u(0) + r_u(\tau)$, $\sigma_\tau^-(u) = \gamma r_u(0) - r_u(\tau)$.
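The finite family of quadratic constraints (41) can be checked directly for a given signal. The sketch below is only an illustration under stated assumptions: it treats a finitely supported real sequence, and the function names and test signals are hypothetical rather than taken from the cited references.

```python
def autocorrelation(u, tau):
    """r_u(tau) = sum_t u(t + tau) * u(t) for a real, finitely supported sequence."""
    return sum(u[t + tau] * u[t] for t in range(len(u) - tau))


def quadratic_forms(u, gamma, T):
    """Pairs (sigma_tau^+(u), sigma_tau^-(u)) from (41), for tau = 1, ..., T."""
    r0 = autocorrelation(u, 0)
    return [
        (gamma * r0 + autocorrelation(u, tau), gamma * r0 - autocorrelation(u, tau))
        for tau in range(1, T + 1)
    ]


def in_white_noise_set(u, gamma, T):
    """True when every quadratic form in (41) is non-negative at u."""
    return all(plus >= 0 and minus >= 0 for plus, minus in quadratic_forms(u, gamma, T))


impulse = [1.0, 0.0, 0.0, 0.0]    # zero autocorrelation at every nonzero lag
constant = [1.0, 1.0, 1.0, 1.0]   # strongly correlated with its own shifts

print(in_white_noise_set(impulse, 0.1, 3))
print(in_white_noise_set(constant, 0.1, 3))
```

The impulse satisfies the constraints for any $\gamma \ge 0$, while the constant sequence is excluded unless $\gamma$ is large, which is the sense in which the set rules out signals that are far from white.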

Using the same arguments as in the case of $\mathcal{H}_\infty$, robust performance analysis over $W_{\gamma,T}$ reduces
to the question of whether there exist signals $z \in l_2$ verifying simultaneously

$$ \sigma_i(z) > 0, \quad i = 1, \ldots, F+1 \qquad (42) $$

$$ \sigma_\tau^{\pm}(z_{F+1}) > 0, \quad \tau = 1, \ldots, T \qquad (43) $$

We are therefore once again in a position to apply the losslessness theorem cited above, which,
when no such signal exists, implies the existence of non-negative scalings $x_i$, $x_\tau^{\pm}$ satisfying

$$ \sum_{i=1}^{F+1} x_i \sigma_i + \sum_{\tau=1}^{T} x_\tau^+ \sigma_\tau^+ + \sum_{\tau=1}^{T} x_\tau^- \sigma_\tau^- \le 0 \qquad (44) $$

The previous condition is once again convex in the scaling parameters $x_i$, $x_\tau^{\pm}$, which suggests
that computational methods similar to those for robust $\mathcal{H}_\infty$ performance analysis should result.
These issues are further studied in [17, 18], where the theory is developed from a different (though
equivalent) point of view. The idea in [17, 18] is that IQCs can be represented in implicit form, in
terms of uncertain equations. Representing the system also in implicit form reduces the problem
to a standard form of implicit analysis question which is studied in detail in [17, 18]. In particular,
state-space methods are available to compute the convex condition (44).

An important remark is that the losslessness result of [14] only applies to a finite number of
IQCs, which requires $T < \infty$. As $\gamma \to 0$, $T \to \infty$ we will approach the robust $\mathcal{H}_2$ performance measure,
but convergence in $T$ could be slow, leading to large computations. The situation is better for the
frequency domain sets, as is shown next.

5.1.3  Robust $\mathcal{H}_2$ Analysis in the Frequency Domain

Robustness analysis with white noise signals over the set $W_\eta$ appears to be a more complicated
problem since $W_\eta$ is not described by a finite number of scalar constraints: although the integrated
spectrum $S_u(\omega)$ depends quadratically on $u$, and is shift invariant, it takes values in a function
space, namely the space of continuous functions on $[0, 2\pi]$. The constraint

$$ \left| \frac{1}{\|u\|_2^2} S_u(\omega) - \omega \right| \le \eta \quad \forall \omega \in [0, 2\pi] \qquad (45) $$

imposed on $S_u$ in the definition of $W_\eta$ is also infinite dimensional in nature.

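It is shown in [19], however, that this approach leads to a very compact solution to the robust
$\mathcal{H}_2$ analysis problem. Although it remains infinite dimensional, its form lends itself to simple finite
dimensional approximations. The following is the necessary and sufficient condition for robust $\mathcal{H}_2$
performance (i.e. robust performance over $W_\eta$ for small enough $\eta$) for the system of Figure 4:
There exists $X \in \mathbf{X}$, $X > 0$, and a matrix function $\Phi(\omega) \in \mathbb{C}^{n \times n}$, $\Phi = \Phi^*$, such that

$$ \text{(i)} \quad G(e^{j\omega})^* \begin{bmatrix} X & 0 \\ 0 & I \end{bmatrix} G(e^{j\omega}) - \begin{bmatrix} X & 0 \\ 0 & I + \Phi(\omega) \end{bmatrix} < 0 \qquad (46) $$

$$ \text{(ii)} \quad \int_0^{2\pi} \mathrm{trace}(\Phi(\omega))\, \frac{d\omega}{2\pi} < 0 \qquad (47) $$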

The previous condition is stated for the multivariable noise case ($n$ is the dimension of the
noise). We see that (46) is very similar to condition (39) for robust $\mathcal{H}_\infty$ performance; the only
addition is the incorporation of the function $\Phi(\omega)$, which plays the role of an infinite dimensional
“multiplier” corresponding to the constraints defining $W_\eta$. Heuristically, for $n = 1$, $\Phi(\omega)$ allows
for the gain to be larger than 1 at some frequency, provided that it is compensated at some other
frequency by keeping the total effect $\int \Phi(\omega)\, d\omega$ negative; this imposes in effect an average over
frequency performance which corresponds to the $\mathcal{H}_2$ norm.

We note that the computational properties of this test are also of a similar nature to those
of (39). Consequently, analyzing robust $\mathcal{H}_2$ performance is essentially no harder than analyzing
robust $\mathcal{H}_\infty$ performance; this is a strong result which shows the benefits of modeling uncertainty
and disturbances in a consistent framework.

A proof of this result is given in [19], where various extensions are also considered, in particular
to the case of LTI uncertainty, involving frequency-dependent scaling matrices $X$.

5.2  Worst-Case System Identification

The  classical  literature  on  system  identification  (see  [11]  and  references  therein)  characterizes  model 
errors  as  due  to  stochastic  noise;  system  identification  in  this  setting  is  a  special  case  of  an  estimation 
problem  in  statistical  inference.  From  this  perspective,  the  main  requirement  for  an  identification 
scheme  is  that  if  the  true  system  is  in  the  model  class,  the  estimates  are  consistent,  i.e.  they 
converge  to  the  true  values  in  a  stochastic  sense,  as  the  length  of  the  experiment  goes  to  infinity. 

In  contrast,  robust  control  theory  has  relied  on  error  models  based  on  sets,  e.g.  a  ball  of  systems 
in  some  norm.  The  desire  to  make  identification  and  robust  control  more  compatible  has  stimulated 
a  research  direction  (see  e.g.  [9,  26,  6,  21])  which  treats  the  system  identification  problem  from  a 
worst-case  point  of  view,  and  seeks  “hard”  bounds  on  the  identification  error.  In  this  formulation 
noise plays the role of an adversary; if, as is standard in robust control, it is allowed to vary over a
large class (e.g. a ball in a signal space), then consistency of the estimates can no longer be ensured.

We now discuss these issues in the simple situation of a SISO model structure

$$ y = h * u + d \qquad (48) $$

where $h = (h(0), \ldots, h(T-1))$ is FIR, and $d$ is noise. Given data for $y$, $u$ of length $N$, the problem
is to estimate the system $h$. The equations in (48) can also be written in matrix form as $y = Uh + d$,
where $y$, $h$, $d$ are column vectors and $U$ denotes the $N \times T$ Toeplitz matrix with first column $u$.
The 2-norm will be used for signals here; the input is normalized to $\|u\|_2^2 = N$. To simplify the
analysis, assume that the experiment was started at time $-(T-1)$, with values of $u$ which are
$N$-periodic.

In the classical theory, $d$ is assumed to be stochastic white noise: IID random variables, with
zero mean and variance $\sigma^2$. In this linear regression problem the minimum variance estimate for $h$ is
given by the least squares solution

$$ \hat{h} = (U^*U)^{-1} U^* y \qquad (49) $$

where invertibility of $U^*U$ (persistence of excitation) is assumed. The estimator (49) is unbiased,
and its covariance matrix $\sigma^2 (U^*U)^{-1}$ will converge to zero as $N \to \infty$, under stationarity assumptions
on $u$. This implies that in the stochastic sense, the estimator will be consistent.

For worst-case identification, we first follow the usual approach, which is to only restrict $d$ to be
bounded in norm; suppose $\|d\|_2^2 \le \delta^2 N$ (noise to signal ratio $\delta$). Since there is a linear relation (48)
between $h$ and $d$, the set of $h$ values compatible with the data and the constraint $\|d\|_2^2 \le \delta^2 N$ will
be an ellipsoid. It follows that if one wishes to minimize the maximum error in the 2-norm in $h$,
the optimal choice is the center of the ellipsoid, which once again corresponds to the least squares
solution (49). Assuming now for simplicity that $u$ is purely white (i.e. $U^*U = NI$; this is also the
optimal choice), the worst-case estimation error

$$ \|\hat{h} - h\|_2 = \|(U^*U)^{-1} U^* d\|_2 = \frac{1}{N} \|U^* d\|_2 \qquad (50) $$

has a value of $\delta$, corresponding, for example, to $d = \delta u$.
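The worst-case value in (50) can be reproduced on a small example. The following sketch is an illustration only, under assumptions that are not in the text: the input is a scaled impulse (one simple choice with $U^*U = NI$ under the periodic setup), the FIR coefficients are arbitrary, and all names are hypothetical. It compares the adversarial noise $d = \delta u$ with a noise of the same norm that is uncorrelated with the columns of $U$.

```python
import math


def circulant_regressor(u, T):
    """N x T matrix U whose k-th column is u circularly shifted by k samples."""
    N = len(u)
    return [[u[(t - k) % N] for k in range(T)] for t in range(N)]


def estimation_error(U, h, d):
    """2-norm of h_hat - h for y = U h + d, with h_hat = U^T y / N (valid when U^T U = N I)."""
    N, T = len(U), len(U[0])
    y = [sum(U[t][k] * h[k] for k in range(T)) + d[t] for t in range(N)]
    h_hat = [sum(U[t][k] * y[t] for t in range(N)) / N for k in range(T)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h_hat, h)))


N, T, delta = 8, 3, 0.1
u = [math.sqrt(N)] + [0.0] * (N - 1)      # scaled impulse: ||u||^2 = N and U^T U = N I
U = circulant_regressor(u, T)
h = [1.0, -0.5, 0.25]                     # hypothetical "true" FIR system

d_aligned = [delta * ut for ut in u]      # adversarial choice d = delta * u
d_uncorrelated = [0.0] * N
d_uncorrelated[T] = delta * math.sqrt(N)  # same norm, zero correlation with the columns of U

err_aligned = estimation_error(U, h, d_aligned)
err_uncorrelated = estimation_error(U, h, d_uncorrelated)
print(err_aligned, err_uncorrelated)
```

Both disturbances satisfy $\|d\|_2^2 = \delta^2 N$; only the one aligned with the input produces an estimation error of size $\delta$.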

We therefore find that although both points of view lead in this case to the same optimal
estimate, they attach to it a different interpretation. In particular, consistency is lost in the
worst-case setting: the estimation error cannot be made smaller than $\delta$, no matter how long the data
record is. The same was found in [9, 26] for other system norms. The reason for this pessimistic
interpretation is that the noise, which plays an adversarial role, is allowed to vary in a class where
it can “conspire” to have a high correlation with the input. This suggests that the desirable
consistency interpretation can be recovered if the disturbance is constrained in the style of this
paper to have low cross correlation with $u$.

One way of doing this was studied recently by Venkatesh and Dahleh [27]: the input is chosen
to be periodic of period $T$ (this allows for persistence of excitation of order $T$), and the disturbance
$d$ is restricted to the set $W_{N,\gamma,N-1}$ (see footnote 3). The main observation from [27] is that in this case (assuming
$N$ is a multiple of $T$),

$$ \|U^* d\|_2^2 = \sum_{\tau=0}^{N-1} r_d(\tau)\, r_u^T(\tau) \qquad (51) $$

where $r_d(\tau)$ is the correlogram of $d$ (length $N$) and $r_u^T(\tau)$ is the correlogram for $u$ of length $T$,
repeated periodically. For a purely white $u$, we would have

$$ r_u^T(\tau) = \begin{cases} T & \text{for } \tau = kT,\ k = 0, \ldots, \frac{N}{T} - 1 \\ 0 & \text{for } \tau \neq kT,\ 1 \le \tau \le N-1 \end{cases} \qquad (52) $$

Imposing that $d \in W_{N,\gamma,N-1}$, (50), (51) and (52) give

$$ \|\hat{h} - h\|_2^2 \le \frac{T}{N^2} \left( r_d(0) + \sum_{k=1}^{N/T-1} r_d(kT) \right) \le \frac{T \delta^2}{N} \left( 1 + \gamma \left( \frac{N}{T} - 1 \right) \right) \le \left( \gamma + \frac{T}{N} \right) \delta^2 \qquad (53) $$

Footnote 3: In [27] a variation of this set is used; it leads nevertheless to similar bounds as those given here.


We now consider another way to constrain the identification problem, which is to directly impose
low correlation between $u$ and $d$. For example, we can impose that $(u, d)$ is a white signal in the
multivariable sense; more precisely, that $(\delta u, d)$ (scaling both components to the same size) is in
the set $W^2_{N,\gamma,T}$. This set is the finite horizon version of the set $W^n_{\gamma,T}$ of (35) and in particular
imposes the cross correlation constraints

$$ \delta |\langle \lambda^\tau u, d \rangle| = |\langle \lambda^\tau \delta u, d \rangle| \le \gamma (\|\delta u\|_2^2 + \|d\|_2^2) = 2 \gamma \delta^2 N, \quad \tau = 0, \ldots, T \qquad (54) $$

Since the elements of $U^* d$ are $\langle \lambda^\tau u, d \rangle$, $\tau = 0, \ldots, T-1$, these bounds can be applied to (50), giving

$$ \|\hat{h} - h\|_2 = \frac{1}{N} \|U^* d\|_2 \le \frac{1}{N}\, 2 \gamma \delta N \sqrt{T} = 2 \gamma \delta \sqrt{T} \qquad (55) $$

In both cases ((53) and (55)), if $\gamma_N \to 0$ as $N \to \infty$, we obtain the consistency property $\hat{h}_N \to h$.
By choosing an appropriate decay rate for $\gamma$ (e.g. $\gamma_N = N^{-\alpha}$, $\alpha < \frac{1}{2}$), the chosen disturbance set
has high probability from the stochastic viewpoint (Theorem 3 applies to the case $d \in W_{N,\gamma,N-1}$; a
similar argument can be used for the multivariable case, or applied to the constraints (54) alone).
Therefore, our class of disturbances is still rich enough to accommodate classical identification.
In addition, the errors in (53) and (55) will decay to zero in polynomial time (in contrast to the
complexity results of [6, 21]).

Like those in [26, 6, 21], these results for FIR identification are mainly of conceptual value, and
contribute to understanding the properties of the identification problem from a worst-case perspective.
However, they provide important practical guidelines as to how a more general identification problem
should be posed when worst-case guarantees are sought (e.g. identification involving noise and set
descriptions of unmodeled dynamics). To avoid conservatism the disturbance must be constrained
explicitly, and correlation constraints are an adequate tool for this. These more general problems
are currently under investigation.

6  Conclusion 

As a field of engineering science, control theory has a broad interaction with mathematics, drawing
on tools from various disciplines, such as dynamical systems, algebra, functional analysis and
probability. While these provide a variety of viewpoints, which is an asset of this field, it is sometimes
difficult to combine the positive features of the different frameworks. In this paper we have
succeeded in addressing one such situation, providing a meeting point between the functional analytic
and stochastic points of view.

Of  course,  many  problems  will  not  yield  to  this  kind  of  unification.  In  particular,  not  all  aspects 
of  a  stochastic  description  can  be  captured  by  worst-case  analysis  over  a  typical  set.  Nevertheless, 
we  feel  that  there  is  potential  for  further  applications  of  this  line  of  thinking  in  various  engineering 
problems,  which  naturally  call  for  a  combination  of  “hard  bounds”  and  probabilistic  models. 


Appendix 

This  section  contains  proofs  and  supplementary  material  for  the  stochastic  results. 

Proof  of  Theorem  3: 

Part 1: For the case of a fixed time lag $\tau$, the distribution of the autocorrelation $r_x(\tau)$ has
been extensively studied in the statistical literature [2, 1]; exact expressions for the distribution of
$r_x(\tau)/r_x(0)$ when $x(t)$ is Gaussian are obtained in [1], and it follows that $\sqrt{N}\, r_x(\tau)/r_x(0)$ is asymptotically
normal $\mathcal{N}(0, 1)$. Since $\gamma_N \sqrt{N} \to \infty$, and $T$ is fixed,

$$ \mathcal{P}(x \notin W_{N,\gamma_N,T}) \le \sum_{\tau=1}^{T} \mathcal{P}\left( \frac{|r_x(\tau)|}{r_x(0)} > \gamma_N \right) \xrightarrow{N \to \infty} 0 \qquad (56) $$

□

In parts 2, 3 of the theorem, the number of correlation constraints grows with the sample size, and
the argument with the normal approximation cannot be used: even though each $r_x(\tau)$ for fixed
$\tau$ is asymptotically normal, the joint distribution of $(r_x(1), \ldots, r_x(N-1))$ is defined on a space of
increasing dimension, where no global averaging occurs. Our proof relies on a Hoeffding inequality
for sums of bounded random variables [10]:

Theorem 7 (Hoeffding) Let $z_0, \ldots, z_{N-1}$ be independent random variables, of mean $\mu$ and bounded
by $a \le z_n \le b$; define $\bar{z} = \frac{1}{N} \sum_{n=0}^{N-1} z_n$. Then for $\varepsilon > 0$,

$$ \mathcal{P}(\bar{z} - \mu \ge \varepsilon) \le e^{\frac{-2N\varepsilon^2}{(b-a)^2}} \qquad (57) $$
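To give a feeling for the size of the bound (57), the sketch below evaluates it and compares it with an exact tail probability in one simple case. The choice of independent fair 0/1 variables is an assumption made purely for illustration (the proof below applies the inequality to products of samples instead), and the function names are hypothetical.

```python
import math


def hoeffding_bound(N, eps, a, b):
    """Right-hand side of (57): exp(-2 N eps^2 / (b - a)^2)."""
    return math.exp(-2.0 * N * eps ** 2 / (b - a) ** 2)


def fair_coin_tail(N, k_min):
    """Exact P(z_0 + ... + z_{N-1} >= k_min) for independent fair 0/1 variables."""
    return sum(math.comb(N, k) for k in range(k_min, N + 1)) / 2 ** N


N = 100
bound = hoeffding_bound(N, 0.1, 0.0, 1.0)  # mean 0.5, deviation eps = 0.1
exact = fair_coin_tail(N, 60)              # event: sample mean >= 0.6
print(exact, bound)
```

The bound decays exponentially in $N$ for fixed $\varepsilon$, which is what allows a number of constraints growing with $N$ to be handled by a union bound.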

We want to apply this inequality to the sum $r_x(\tau) = \sum_{t=0}^{N-1} z(t)$, $z(t) = x(t)\, x((t+\tau) \bmod N)$,
with $x(0), \ldots, x(N-1)$ independent, identically distributed. The $z(t)$ are not independent (as required
in Theorem 7), but their dependence is very slight, so the sum can be reduced to three sums of
independent variables, as shown in the following sequence of Lemmas.

Lemma 8 Let $\{a_1, \ldots, a_N\}$ be a permutation of $\{1, \ldots, N\}$. Then the set of ordered pairs $S =
\{(1, a_1), \ldots, (N, a_N)\}$ can be partitioned into three disjoint sets $S_1$, $S_2$, $S_3$, of respective cardinality
$n_1$, $n_2$, $n_3$, such that

1.  No two pairs which fall in a single $S_i$ have a common element of $\{1, \ldots, N\}$
(i.e., if $(n, a_n), (m, a_m) \in S_i$, $n \neq m$, then $n \neq a_m$ and $m \neq a_n$).

2.  $n_i \ge \frac{N}{5}$, $i = 1, 2, 3$

Proof: We perform the classification by induction. For a given $n$, assume that the pairs
$(1, a_1), \ldots, (n, a_n)$ have been classified in disjoint sets $S_1^{(n)}$, $S_2^{(n)}$, $S_3^{(n)}$ which satisfy condition 1.
Now consider a new pair $(n+1, a_{n+1})$. Since there are at most two pairs in $S$ with an element in
common with $(n+1, a_{n+1})$, at least one of the three $S_i^{(n)}$ will have none of these pairs and therefore
condition 1 is maintained if $(n+1, a_{n+1})$ is added to it. This implies by induction that it is possible
to partition $S$ into sets $S_1$, $S_2$, $S_3$ satisfying condition 1.

Now consider their cardinalities $n_1$, $n_2$, $n_3$. Assume that $2n_i < n_j$ for some $i, j$. Since there are
at most $2n_i$ elements in the pairs of $S_i$, and $S_j$ has more pairs, then at least one pair in $S_j$ shares
no elements with those of $S_i$. Therefore this pair can be moved to $S_i$, maintaining condition 1.
Repeating this procedure will lead to a partition $S_1$, $S_2$, $S_3$ satisfying condition 1 and $2n_i \ge n_j$ $\forall i, j$.
If $n_1$ is, for example, the minimum of the $n_i$, then $N = n_1 + n_2 + n_3 \le n_1 + 2n_1 + 2n_1 = 5n_1$,
which implies condition 2 is satisfied.

□ 
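The inductive classification in the first part of the proof is a greedy procedure, and can be sketched as follows. This is only an illustration: it uses 0-based indices (an assumption of the sketch; the lemma uses $1, \ldots, N$), the function name is hypothetical, and the rebalancing step that yields condition 2 is not implemented.

```python
def partition_pairs(perm):
    """Greedily split the pairs (n, perm[n]) into three sets satisfying condition 1.

    perm is a permutation of range(N) given as a list.
    """
    sets = [[], [], []]
    for n, a in enumerate(perm):
        for s in sets:
            # (n, a) shares an element with an earlier pair (m, am) only if m == a or am == n
            if all(m != a and am != n for m, am in s):
                s.append((n, a))
                break
        else:
            # At most two earlier pairs can conflict, so one of the three sets is always free.
            raise AssertionError("unreachable for a valid permutation")
    return sets


N, tau = 10, 3
shift = [(n + tau) % N for n in range(N)]  # circular shift, as used in Lemma 9
for s in partition_pairs(shift):
    print(s)
```

Within each printed set no index appears twice, so products $x_n x_{a_n}$ taken over one set involve disjoint groups of samples.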

Lemma 9 Let $N \ge 3$, and $x(0), x(1), \ldots, x(N-1)$ be independent identically distributed random
variables. Fix $1 \le \tau < N$. Then $r_x(\tau)$ can be expressed as $r_x(\tau) = \Sigma_1 + \Sigma_2 + \Sigma_3$, where each $\Sigma_i$ is
the sum of $n_i$ independent, identically distributed random variables, and $n_i \ge \frac{N}{5}$.

Proof: For the permutation $a_1, \ldots, a_N$ given by the circular shift $a_n = (n + \tau) \bmod N$, perform
the classification into sets $S_1$, $S_2$, $S_3$ of Lemma 8. Then for each $i$ choose

$$ \Sigma_i := \sum_{(n, a_n) \in S_i} x_n x_{a_n} \qquad (58) $$

By construction of the sets $S_i$, the terms in the sum (58) are independent, identically distributed.

□


Now  we  return  to  the  rest  of  Theorem  3. 

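Part 2: Assume $x(0), \ldots, x(N-1)$ are bounded random variables, $|x(t)| \le K$. Pick $1 \le \tau < N$.
From Lemma 9, $r_x(\tau) = \Sigma_1 + \Sigma_2 + \Sigma_3$, where each $\Sigma_i$ is the sum of $n_i$ independent, identically
distributed random variables, with zero mean and bounded in $[-K^2, K^2]$. Invoking Hoeffding's
inequality and $n_i \ge \frac{N}{5}$, we have

$$ \mathcal{P}\left( \frac{r_x(\tau)}{N} > \varepsilon \right) \le \sum_{i=1}^{3} \mathcal{P}\left( \frac{\Sigma_i}{n_i} > \varepsilon \right) \le \sum_{i=1}^{3} e^{\frac{-n_i \varepsilon^2}{2K^4}} \le 3 e^{\frac{-N \varepsilon^2}{10 K^4}} \qquad (59) $$

The same argument can be employed to bound $\mathcal{P}\left( \frac{-r_x(\tau)}{N} > \varepsilon \right)$, for each value of $\tau$. This implies

$$ \mathcal{P}\left( \max_{1 \le \tau < N} \frac{|r_x(\tau)|}{N} > \varepsilon \right) \le 6N e^{\frac{-N \varepsilon^2}{10 K^4}} = 6 e^{\log(N) \left( 1 - \frac{N \varepsilon^2}{10 K^4 \log(N)} \right)} \qquad (60) $$

Now choose $0 < \rho < 1$. The complement of $W_{N,\gamma,N-1}$ can be written as

$$ W^c_{N,\gamma,N-1} = \left\{ \max_{1 \le \tau < N} \frac{|r_x(\tau)|}{r_x(0)} > \gamma \right\} \subset \left\{ \max_{1 \le \tau < N} \frac{|r_x(\tau)|}{N} > \gamma \rho \right\} \cup \left\{ \frac{1}{N} \sum_{t=0}^{N-1} x(t)^2 < \rho \right\} \qquad (61) $$

The probability of the first set is bounded by (60), setting $\varepsilon = \gamma \rho$. The probability of the second
set can be bounded by another use of the Hoeffding inequality, applied to the bounded IID random
variables $x(t)^2$. Putting everything together,

$$ \mathcal{P}(W^c_{N,\gamma,N-1}) \le 6 e^{\log(N) \left( 1 - \frac{N \gamma^2 \rho^2}{10 K^4 \log(N)} \right)} + e^{\frac{-2N(1-\rho)^2}{K^4}} \qquad (62) $$

The second term clearly goes to zero as $N \to \infty$, and the same happens with the first term since
by hypothesis $\gamma_N \sqrt{\frac{N}{\log(N)}} \xrightarrow{N \to \infty} \infty$.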


Part 3: Assume $x(0), \ldots, x(N-1)$ are Gaussian random variables, $x(t) \sim \mathcal{N}(0, 1)$. Choosing
$K(N) = \sqrt{2 \log(N)}$, define the random variables $v(t)$, $t = 0, \ldots, N-1$ by truncation:

$$ v(t) = \begin{cases} x(t) & \text{if } |x(t)| \le K(N) \\ 0 & \text{otherwise} \end{cases} \qquad (63) $$

$$ \mathcal{P}(x \neq v) \le N \mathcal{P}(x(t) \neq v(t)) = N \mathcal{P}(|x(t)| > K(N)) \le \frac{NC}{K(N)} e^{\frac{-K(N)^2}{2}} \qquad (64) $$

In (64) $x = (x(0), \ldots, x(N-1))$, $v = (v(0), \ldots, v(N-1))$, and the second inequality follows from
a standard bound to the tail of the normal distribution ($C$ is a constant). Observing that

$$ \mathcal{P}(x \notin W_{N,\gamma,N-1}) \le \mathcal{P}(v \notin W_{N,\gamma,N-1}) + \mathcal{P}(x \neq v) \qquad (65) $$

it remains to show that $\mathcal{P}(v \notin W_{N,\gamma,N-1})$ also vanishes as $N \to \infty$. Since the variables $v(t)$ are
bounded by $K(N)$, (62) gives

$$ \mathcal{P}(v \notin W_{N,\gamma,N-1}) \le 6 e^{\log(N) \left( 1 - \frac{N \gamma^2 \rho^2}{10 K^4 \log(N)} \right)} + e^{\frac{-2N(1-\rho)^2}{K^4}} \qquad (66) $$

The second term clearly has limit 0 as $N \to \infty$. The first term also goes to 0, since by hypothesis
$\frac{N \gamma^2 \rho^2}{K^4 \log(N)} = \frac{\rho^2}{4} \frac{N \gamma^2}{\log^3(N)}$ goes to infinity.


Remarks  on  the  Proof  of  Theorem  5 


The fact that a uniform bound is being applied to the cumulative periodogram means that we are
imposing a number of constraints of the order of the sample size, as in Theorem 3 parts 2, 3; this
again precludes simple arguments based on averaging.

The key observation, which led Bartlett (see [2]) to propose this test, is to notice that the
stochastic properties of the cumulative periodogram are similar to those used for tests on empirical
distribution functions. The maximum deviation between an empirical distribution and the true
distribution function forms the basis of the Kolmogorov-Smirnov test (see [4]), which has well known
asymptotic properties. The connection with the cumulative periodogram can be seen as follows:
in the case of Gaussian white noise, the periodogram values are independent and exponentially
distributed (see [5]), which implies ([4], Prop. 13.15) that the normalized cumulative periodogram
values $\frac{1}{N \|x\|_2^2} \sum_{k=0}^{m-1} s_x(k)$ have the same joint distribution as an ordered sample of uniform
(0, 1) variables. From these arguments it follows that

$$ \sqrt{N} \sup_{1 \le m \le N} \left| \frac{1}{N \|x\|_2^2} \sum_{k=0}^{m-1} s_x(k) - \frac{m}{N} \right| $$

converges in law to a fixed distribution. Since $\frac{1}{\eta_N \sqrt{N}} \xrightarrow{N \to \infty} 0$, then

$$ \frac{1}{\eta_N} \sup_{1 \le m \le N} \left| \frac{1}{N \|x\|_2^2} \sum_{k=0}^{m-1} s_x(k) - \frac{m}{N} \right| \xrightarrow{N \to \infty} 0 \quad \text{in probability,} $$

which proves the theorem.

An additional remark is that although this proof is valid for Gaussian noise, there is indication
in [2] that the asymptotic properties are insensitive to the noise distribution.

Proof  of  Proposition  6 

For a fixed $\tau \neq 0$, referring to [4] (Proposition 6.31), we find that the random process $z(t) =
x(t)\, x(t+\tau)$ is ergodic, so with probability 1,

$$ \lim_{N \to \infty} \frac{1}{2N+1} \sum_{t=-N}^{N} x(t+\tau)\, x(t) = E[x(t+\tau)\, x(t)] = 0 \qquad (67) $$

Therefore $W_{0,\infty}$ has probability 1 (countable intersection of probability 1 sets).